XML schema clustering with semantic and hierarchical similarity measures

Authors

  • Richi Nayak
  • Wina Iryadi
Abstract

With the growing popularity of XML as a data representation language, the number of XML data collections has grown rapidly. Methods are required to manage these collections and discover useful information from them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic meaning and context of the elements but also the similarity of their hierarchical structure. We support our findings with experiments and analysis.
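
The abstract does not spell out the similarity measures, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each schema is flattened into hypothetical `(element_name, depth)` pairs, uses plain string similarity from `difflib` as a stand-in for a linguistic measure, uses closeness of depth as a crude stand-in for hierarchical similarity, and groups schemas with a greedy threshold rule. The function names, the weight `w_name`, and the threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def name_sim(a, b):
    """String similarity of two element names in [0, 1] (stand-in for a linguistic measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def element_sim(e1, e2, w_name=0.7):
    """Blend name similarity with how close the two elements sit in their schema trees."""
    (n1, d1), (n2, d2) = e1, e2
    level = 1.0 - abs(d1 - d2) / max(d1, d2, 1)
    return w_name * name_sim(n1, n2) + (1 - w_name) * level


def schema_sim(s1, s2):
    """Average best-match score between the elements of two schemas, symmetrised."""
    def directed(a, b):
        return sum(max(element_sim(x, y) for y in b) for x in a) / len(a)
    return (directed(s1, s2) + directed(s2, s1)) / 2


def cluster(schemas, threshold=0.7):
    """Greedy grouping: a schema joins the first cluster whose first member is similar enough."""
    clusters = []
    for name, elems in schemas.items():
        for c in clusters:
            if schema_sim(elems, schemas[c[0]]) >= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical toy schemas: (element name, depth in the schema tree).
SCHEMAS = {
    "book_a": [("book", 0), ("title", 1), ("author", 1)],
    "book_b": [("book", 0), ("title", 1), ("authors", 1)],
    "order": [("purchaseOrder", 0), ("item", 1), ("quantity", 2)],
}

clusters = cluster(SCHEMAS)
print(clusters)
```

A real system would likely replace `name_sim` with a thesaurus-based or ontology-based measure and compare full paths rather than depth alone; the sketch only shows how linguistic and structural scores can be combined before clustering.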

Similar resources

Metaheuristic clustering of Persian XML documents based on structural and content similarity

Due to the increasing number of XML documents, effectively organizing these documents in order to retrieve useful information from them is essential. A possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...

A semantic similarity analysis for data mappings between heterogeneous XML schemas

One of the most critical steps to integrating heterogeneous e-Business applications using different XML schemas is schema mapping, which is known to be costly and error-prone. Past research on schema mapping has not made full use of semantic information imbedded in the hierarchical structure of the XML schema. In this chapter, we investigate the existing schema mapping approaches and propose an...

A novel method for measuring semantic similarity for XML schema matching

Enterprise integration has recently gained unprecedented attention. The paper deals with an essential activity enabling seamless enterprise integration, namely similarity-based schema matching. To this end, we present a supervised approach to measuring semantic similarity between XML schema documents and, more importantly, address a novel approach to augment reliably labeled trai...

A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

Since XML became popular for data representation and exchange over the Web, the number of XML documents has rapidly increased. Turning these documents into a more useful information resource is therefore a new challenge for the field of data mining. We present a novel clustering algorithm, PCXSS, that places heterogeneous XML documents into various groups according ...

Schema Conversion Methods between XML and Relational Models

In this chapter, three semantics-based schema conversion methods are presented: 1) CPI converts an XML schema to a relational schema while preserving semantic constraints of the original XML schema, 2) NeT derives a nested structured XML schema from a flat relational schema by repeatedly applying the nest operator so that the resulting XML schema becomes hierarchical, and 3) CoT takes a relatio...

Journal:
  • Knowl.-Based Syst.

Volume 20  Issue

Pages  -

Publication date: 2007